We start by loading the required package.
ggplot2 is also included in the
tidyverse package.
library(tidyverse)
If not still in the workspace, load the data we saved in the previous lesson.
kings <- read_csv("data_output/kings_plotting.csv", n_max = 54)
ggplot2
ggplot2 is a plotting package that
makes it simple to create complex plots from data stored in a data
frame. It provides a programmatic interface for specifying what
variables to plot, how they are displayed, and general visual
properties. Therefore, we only need minimal changes if the underlying
data change or if we decide to change from a bar plot to a scatterplot.
This helps in creating publication-quality plots with minimal
adjustment and tweaking.
ggplot2 functions work best with data
in the ‘long’ format, i.e., a column for every dimension, and a row for
every observation. Well-structured data will save you lots of time when
making figures with ggplot2.
ggplot graphics are built step by step by adding new elements. Adding layers in this fashion allows for extensive flexibility and customization of plots.
Each chart built with ggplot2 must include the following:
Data
Aesthetic mapping (aes)
Geometric objects (geom), e.g. bar (geom_bar), scatterplot (geom_point), line
(geom_line), etc.
Thus, the template for a graphic in ggplot2 is:
<DATA> %>%
    ggplot(aes(<MAPPINGS>)) +
    <GEOM_FUNCTION>()
Remember from the last lesson that the pipe operator
%>% places the result of the previous line(s) into the
first argument of the function. ggplot is
a function that expects a data frame to be the first argument. This
allows us to pipe the data into the function instead of specifying the
data = argument within the ggplot function.
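As a minimal sketch of this equivalence, the two calls below build the same plot, once by piping the data and once with an explicit data = argument. The small toy_kings data frame is invented for illustration (its values are not from the lesson data) so that the snippet runs on its own; the names p_piped and p_explicit are likewise hypothetical.

```r
library(ggplot2)
library(magrittr)  # provides %>% (also attached by library(tidyverse))

# Hypothetical stand-in for the kings data; values are invented
toy_kings <- data.frame(
  Midyear = c(950, 1000, 1050, 1100),
  Reign_duration = c(5, 20, 12, 30)
)

# Piping the data frame into ggplot() ...
p_piped <- toy_kings %>%
  ggplot(aes(x = Midyear, y = Reign_duration)) +
  geom_point()

# ... is equivalent to passing it through the data argument
p_explicit <- ggplot(data = toy_kings,
                     aes(x = Midyear, y = Reign_duration)) +
  geom_point()

# Compare the data that each plot would actually draw
identical(layer_data(p_piped), layer_data(p_explicit))
```

Which form to use is largely a matter of style; piping fits well when the data are filtered or reshaped immediately before plotting.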
Use the ggplot() function and bind the plot to a
specific data frame.
kings %>%
    ggplot()
Define a mapping (using the aesthetic (aes) function),
by selecting the variables to be plotted and specifying how to present
them in the graph, e.g. as x/y positions or characteristics such as
size, shape, color, etc.
kings %>%
    ggplot(aes(x = Midyear, y = Reign_duration))
Add ‘geoms’: graphical representations of the data in the plot
(points, lines, bars). ggplot2 offers many
different geoms; we will use some common ones today, including:
geom_point() for scatter plots, dot plots, etc.
geom_boxplot() for, well, boxplots!
geom_line() for trend lines, time series, etc.
To add a geom to the plot use the + operator. Because we
have two continuous variables, let’s use geom_point()
first:
kings %>%
    ggplot(aes(x = Midyear, y = Reign_duration)) +
    geom_point() # basic scatterplot
The + in the ggplot2
package is particularly useful because it allows you to modify existing
ggplot objects. This means you can easily set up plot
templates and conveniently explore different types of plots. For example, the
above plot can be rebuilt, this time with a smoothed trend line added, using code like this, similar to the
“intermediate steps” approach in the previous lesson:
ggplot(kings, aes(x = Midyear, y = Reign_duration)) +
    geom_point() + # basic scatterplot
    geom_smooth() # visual trend
Anything you put in the ggplot() function can be seen
by any geom layers that you add (i.e., these are universal plot
settings). This includes the x- and y-axis mapping you set up in
aes().
You can also specify mappings for a given geom independently of the mappings defined globally in the ggplot() function.
The + sign used to add new layers must be placed at the
end of the line containing the previous layer. If, instead, the
+ sign is added at the beginning of the line containing the
new layer, ggplot2 will not add the new
layer and will return an error message.
kings_plot <- ggplot(kings, aes(x = Midyear, y = Reign_duration)) +
    geom_point() + # basic scatterplot
    geom_smooth() + # visual trend
    labs(title = "How long Danish kings ruled over time",
         x = "Year", y = "Years they ruled") + # better title and axis labels
    theme_bw() + # cleaner look
    theme(text = element_text(size = 14)) # bigger font to make readable
## This is the correct syntax for adding layers
kings_plot +
    geom_text(aes(label = Name), size = 3)
## This will not add the new layer and will return an error message
kings_plot
+ geom_text(aes(label = Name), size = 3)
Building plots with ggplot2 is
typically an iterative process. We start by defining the dataset we’ll
use, lay out the axes, and choose a geom:
kings %>%
    ggplot(aes(x = Midyear, y = Reign_duration)) +
    geom_point()
Then, we start modifying this plot to extract more information from
it. For instance, when inspecting the plot we notice that points only
appear at the intersection of whole numbers of Midyear and
Reign_duration, so rulers who share the same values are drawn on top of one another.
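Points that share the same coordinates hide one another. One common remedy is to add a small amount of random noise with geom_jitter() and to make the points semi-transparent. The sketch below uses an invented toy_kings data frame (not the lesson data) in which two rows overlap exactly; the width, height and alpha values are illustrative choices only.

```r
library(ggplot2)

# Hypothetical data: the first two rows have identical coordinates
toy_kings <- data.frame(
  Midyear = c(1000, 1000, 1050, 1100),
  Reign_duration = c(10, 10, 12, 30)
)

set.seed(42)  # make the random jitter reproducible

p_jitter <- ggplot(toy_kings, aes(x = Midyear, y = Reign_duration)) +
  geom_jitter(alpha = 0.5,   # semi-transparent points
              width = 0.2,   # maximum horizontal displacement
              height = 0.2)  # maximum vertical displacement

p_jitter
```

Keeping width and height small relative to the spacing of the data means the points are separated without being moved far from their true positions.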
To colour each royal house in the plot differently, you could use a
vector as an input to the argument color.
However, because we are now mapping features of the data to a colour,
instead of setting one colour for all points, the colour of the points
now needs to be set inside a call to the
aes function. When we map a variable in
our data to the colour of the points,
ggplot2 will provide a different colour
corresponding to the different values of the variable. We will continue
to specify the values of alpha,
width, and
height (arguments passed to geom_jitter()) outside of the
aes function because we are using the same
value for every point. ggplot2 understands both the Commonwealth English
and American English spellings for colour, i.e., you can use either
color or colour. Here is an example where we
color points by the royal house of each
ruler:
kings %>%
    ggplot(aes(x = Midyear, y = Reign_duration)) +
    # geom_jitter() accepts width and height; geom_point() would ignore them
    geom_jitter(aes(color = House), width = 0.2, height = 0.2)
There appears to be an increasing trend in reign duration over time, but is reign duration growing evenly with time or are houses substantially different?
As you will learn, there are multiple ways to plot a relationship between variables. A boxplot can be used to plot the distribution of data points within a group, in our case individual reigns within a royal house.
kings %>%
    ggplot(aes(x = House, y = Reign_duration, color = House)) +
    geom_boxplot() +
    theme_bw()
By adding points to a boxplot, we can have a better idea of the number
of measurements and of their distribution:
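kings %>%
    ggplot(aes(x = House, y = Reign_duration, color = House)) +
    geom_boxplot() +
    theme_bw() +
    geom_jitter(alpha = 0.3,
                color = "black",
                width = 0.2,
                height = 0.2)
A violin plot shows the shape of the distribution within each house; here we also relabel the axes and wrap and rotate the house names:
kings %>%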
    ggplot(aes(x = House, y = Reign_duration, color = House)) +
    geom_violin() +
    theme_bw() +
    labs(x = "",
         y = "Years on the throne") +
    scale_x_discrete(labels = function(x) str_wrap(x, width = 20)) +
    theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 45,
                                     hjust = 0.5, vjust = 0.5))
Use what you just learned to create a scatter plot of
reign duration against the
midpoint of each ruler's reign, with the house shown in
different colours. Does this seem like a good way to display the
relationship between these variables? What other kinds of plots might
you use to show this type of data?
kings %>%
    ggplot(aes(x = Midyear, y = Reign_duration)) +
    geom_jitter(aes(color = House),
                # alpha = 0.3,
                height = 0.2)
This is not a great way to show this type of data because it is difficult to distinguish any trend in such a wide spread of dots. What other plot types could help you visualize this relationship better?
::::::::::::::::::::::::::::::::::::::::::::::::::
ggplot2 themes
In addition to theme_bw(), which changes the plot
background to white, ggplot2 comes with
several other themes which can be useful to quickly change the look of
your visualization. The complete list of themes is available at https://ggplot2.tidyverse.org/reference/ggtheme.html.
theme_minimal() and theme_light() are popular,
and theme_void() can be useful as a starting point to
create a new hand-crafted theme.
The ggthemes
package provides a wide variety of options (including an Excel 2003
theme). The ggplot2
extensions website provides a list of packages that extend the
capabilities of ggplot2, including
additional themes.
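Because a theme is added to a plot with + like any other component, one way to compare themes is to store the plot once and add each theme in turn. The sketch below assumes an invented toy_kings data frame rather than the lesson data, and base_plot and the plot_* names are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for the kings data; values are invented
toy_kings <- data.frame(
  Midyear = c(950, 1000, 1050, 1100),
  Reign_duration = c(5, 20, 12, 30),
  House = c("A", "A", "B", "B")  # placeholder house names
)

# Build the plot once, without committing to a theme
base_plot <- ggplot(toy_kings, aes(x = Midyear, y = Reign_duration)) +
  geom_point(aes(color = House))

# Same plot, three different built-in themes
plot_minimal <- base_plot + theme_minimal()
plot_light   <- base_plot + theme_light()
plot_void    <- base_plot + theme_void()

plot_minimal  # print one of them; swap in the others to compare
```

Storing the themed variants as separate objects leaves base_plot unchanged, so each comparison starts from the same plot.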
Experiment with at least two different themes. Build the previous plot using each of those themes. Which do you like best?
Take a look at the ggplot2
cheat sheet, and think of ways you could improve the original
smoothed kings_plot.
Now, let’s try a different theme. Make sure you also change the axis names to something more informative than ‘Midyear’ and ‘Reign_duration’ and add a title to the figure:
kings %>%
    ggplot(aes(x = Midyear, y = Reign_duration)) +
    geom_jitter(aes(color = House),
                # alpha = 0.3,
                height = 0.2) +
    theme_classic() # try also theme_minimal() and others
With all of this information in hand, please take another five
minutes to either improve one of the plots generated in this exercise or
create a beautiful graph of your own. Use the RStudio ggplot2
cheat sheet for inspiration. Here are some ideas:
After creating your plot, you can save it to a file in your favourite format. The Export tab in the Plot pane in RStudio will save your plots at low resolution, which will not be accepted by many journals and will not scale well for posters.
Instead, use the ggsave() function, which allows you to
easily change the dimension and resolution of your plot by adjusting the
appropriate arguments (width, height and
dpi).
Make sure you have the fig_output/ folder in your
working directory.
my_plot <- kings_plot # any ggplot object, e.g. the kings_plot built earlier
ggsave("fig_output/name_of_file.png", my_plot, width = 15, height = 10)
Note: The parameters width and height also
determine the font size in the saved plot.
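As a hedged sketch of how these arguments fit together, the call below saves a plot with explicit units and dpi. The toy data, the temporary output path and the chosen size are assumptions for illustration; in the lesson you would write to fig_output/ instead.

```r
library(ggplot2)

# Hypothetical stand-in for the kings data; values are invented
toy_kings <- data.frame(
  Midyear = c(950, 1000, 1050, 1100),
  Reign_duration = c(5, 20, 12, 30)
)

toy_plot <- ggplot(toy_kings, aes(x = Midyear, y = Reign_duration)) +
  geom_point()

# Temporary file so the example does not depend on an existing folder
out_file <- tempfile(fileext = ".png")

ggsave(out_file, toy_plot,
       width = 15, height = 10,  # physical size of the image
       units = "cm",             # interpret width/height as centimetres
       dpi = 300)                # resolution in dots per inch

file.exists(out_file)
```

Font sizes are given in absolute points, so at a fixed dpi a smaller width and height make the text look larger relative to the plot area.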
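ggplot2 is a flexible and useful tool for creating
plots in R.
The data set and the aesthetic mappings are defined using the ggplot function.
Additional layers, including geoms, are added using the + operator.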